ON THE ROAD TO ASSESSING DEEPER LEARNING:
WHAT DIRECTION DO TEST BLUEPRINTS PROVIDE?


Joan L. Herman, Deborah La Torre Matrundola, and Jia Wang

CRESST/University of California, Los Angeles

Abstract 

This study examines the extent to which deeper learning is expected to be present in
the new college and career ready (CCR) standards. This is done by examining the
distribution of items and tasks at high levels of cognitive demand (DOK3 and DOK4)
in the summative test blueprints developed by the Partnership for Assessment of
Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment
Consortium (Smarter Balanced). The study found that while only [...] of the
consortia's assessment items and tasks appear to require higher levels of cognitive
demand, approximately 30-45% of the total possible raw scores are allocated to
deeper learning. Furthermore, the analyses indicated that while the end-of-year
(EOY) exams are focused on relatively lower level items, components of the
performance tasks primarily concentrate on deeper learning and higher levels of
thinking. If the consortia maintain the levels of cognitive demand specified in their
blueprints, there is no doubt that this will result in an increase in intellectual demand
from prior state tests.

Introduction

States, districts, and schools across the country have been preparing for new tests of college
and career ready standards. Last year's field testing, conducted by both PARCC and Smarter
Balanced, provided a general sense of the increased demands these new tests will bring. "Wow,
this is hard!" seemed a common student refrain based on media reports. For some, the
exclamation marked dismay; for others it marked pleasure in being challenged to think and
solve problems in new ways.

Preparing for these new expectations, however, requires more than emotional reactions and
more than a general sense that they are "harder." What is needed is a better understanding of the
ways in which these tests will be more challenging and, particularly, the extent to which they
will assess deeper learning. In this report, we use the Partnership for Assessment of Readiness
for College and Careers and the Smarter Balanced Assessment Consortium test blueprints to
forecast the deeper learning challenges that the new tests will bring and suggest ways in which
this analysis might inform curriculum and instruction.

We start by sharing the concept of deeper learning and the metric we are using to gauge its
representation. We then describe how publicly available test blueprints enable us to predict how
deeper learning likely will be distributed on PARCC and Smarter Balanced operational tests.
Finally, we share the results of our analysis and suggest implications for curriculum and
instruction.

Background on Deeper Learning

Deeper learning is the concept we use to capture the major changes in learning expectations
that today's new college and career standards for English language arts (ELA), mathematics, and
science are intended to embody. These standards reflect a general consensus that to be prepared
for success in college and work, students need to develop deeper content knowledge and be
better able both to apply their knowledge to think critically and solve complex problems and to
communicate their knowledge and skills with others (William and Flora Hewlett Foundation,
n.d.). These capabilities are the essence of deeper learning as defined by the new standards.¹
They also characterize the nature of academic knowledge and skills that our new tests must
address to be valid measures of and to reinforce the development of the new standards.

How does one determine how well deeper learning is addressed in the new CCR tests? We
are using Norman Webb's Depth-of-Knowledge (DOK) classification scheme (see Webb, Alt,
Ely, & Vesperman, 2005; http://wat.wceruw.org) to make a determination, because his scheme
has been used in prior studies of state tests and thus enables an easy comparison between the new
tests and prior practice. The scheme uses the following four levels to characterize the DOK and
thinking required to respond to an item:

• DOK1: Recall of a fact, term, concept, or procedure; basic comprehension

• DOK2: Application of concepts and/or procedures involving some mental processing

• DOK3: Applications requiring abstract thinking, reasoning, and/or more complex
inferences

• DOK4: Extended analysis or investigation that requires synthesis and analysis across
multiple contexts and nonroutine problems and applications

We have argued elsewhere that DOK3 and DOK4 represent important aspects of deeper
learning, because to answer items or tasks at these levels students have to apply and synthesize
their knowledge and engage in critical thinking and reasoning (Herman & Linn, 2015). Further,
both levels have been grossly underrepresented in most prior state tests. For example, RAND's
analysis of the DOK assessed in released items and tests from the 17 states reputed to have the
most challenging state assessments showed that virtually all of the selected and constructed
response items in mathematics were categorized as DOK1 or DOK2, with similar results for the
selected response items of reading and writing (Yuan & Le, 2012). The situation was better for
constructed response items for those states that included such items, with more than half the
constructed response reading tasks at or above DOK3, and for the eight states that directly
assessed writing, the writing prompts were nearly uniformly classified at DOK3 or DOK4.

¹As defined by the William and Flora Hewlett Foundation (n.d.), deeper learning also includes
competencies focusing on collaboration, self-directed learning, and academic mindsets that are
not directly in the standards, nor are they targets for either the PARCC or Smarter Balanced
end-of-year summative assessments.

A Quick Review of Evidence-Centered Assessment Design

To understand how blueprints currently offer an advance view of what the assessments
from PARCC and Smarter Balanced will assess, consider the evidence-centered design (ECD)
process that both are using to develop their systems. ECD starts with the basic premise that
assessment is a process of reasoning from evidence to evaluate specific claims about student
capability. In essence, student responses to assessment items and tasks provide the evidence for
the reasoning process, and psychometric and other validity analyses establish the sufficiency of
the evidence for substantiating each claim (see Pellegrino, Chudowsky, & Glaser, 2001).

PARCC

1. Reading: Students read and comprehend a range of sufficiently complex texts independently.

2. Writing: Students write effectively when using and/or analyzing sources.

3. Research: Students build and present knowledge through research and the integration,
comparison, and synthesis of ideas.

Smarter Balanced

1. Reading: Students can read closely and analytically to comprehend a range of increasingly
complex literary and informational texts.

2. Writing: Students can produce effective and well-grounded writing for a range of purposes
and audiences.

3. Speaking and Listening: Students can employ effective speaking and listening skills for a
range of purposes and audiences.

4. Research/Inquiry: Students can engage in research/inquiry to investigate topics, and to
analyze, integrate, and present information.

Figure 1. PARCC and Smarter Balanced claims for the ELA summative assessments.


The ECD process starts with a clear delineation of the claims that are to be evaluated and
the evidence that can be used to substantiate the claims, which provides a clear and transparent
foundation for assessment development. Both PARCC and Smarter Balanced have reorganized
the Common Core State Standards into core claims about student competency in ELA and
mathematics that their tests are designed to evaluate. Both start with an overall claim about
students becoming college and career ready and then subdivide these overall expectations into
more specific subclaims for ELA and mathematics. Figure 1 and Figure 2 summarize the
PARCC and Smarter Balanced claims for both subject areas. (We return later to an analysis of
these claims.)
PARCC

1. Major Content with Connections to Practices: The student solves problems involving the
Major Content for the grade/course with connections to the Standards for Mathematical Practice.

2. Additional and Supporting Content with Connections to Practices: The student solves
problems involving the Additional and Supporting Content for the grade/course with connections
to the Standards for Mathematical Practice.

3. Highlighted Practices MP.3 and 6 with Connections to Content (expressing mathematical
reasoning): The student expresses grade/course level appropriate mathematical reasoning by
constructing viable arguments, critiquing the reasoning of others, and/or attending to precision
when making mathematical statements.

4. Highlighted Practice MP.4 with Connections to Content (modeling/application): The student
solves real-world problems with a degree of difficulty appropriate to the grade/course by applying
knowledge and skills articulated in the standards for the current grade/course (or for more complex
problems, knowledge and skills articulated in the standards for previous grades/courses), engaging
particularly in the Modeling practice, and where helpful making sense of problems and persevering
to solve them, reasoning abstractly and quantitatively, using appropriate tools strategically, looking
for and making use of structure, and/or looking for and expressing regularity in repeated reasoning.

5. Fluency: The student demonstrates fluency as set forth in the Standards for Content in
grades 3-6.

Smarter Balanced

1. Concepts and Procedures: Students can explain and apply mathematical concepts and interpret
and carry out mathematical procedures with precision and fluency.

2. Problem Solving: Students can solve a range of complex well-posed problems in pure and applied
mathematics, making productive use of knowledge and problem solving strategies.

3. Communicating Reasoning: Students can clearly and precisely construct viable arguments to
support their own reasoning and to critique the reasoning of others.

4. Modeling and Data Analysis: Students can analyze complex, real-world scenarios and can
construct and use mathematical models to interpret and solve problems.

Figure 2. PARCC and Smarter Balanced claims for the mathematics summative assessments.

Each claim is further defined by specific evidence statements (PARCC) or assessment
targets (Smarter Balanced) that the claim encapsulates. These statements or targets essentially
represent particular Common Core standards or clusters of standards, and for Smarter Balanced
also indicate the DOK level at which each target may be assessed.

The targets and evidence statements become the subjects of item or task specifications. The
specifications provide guidance and rules for item writers to follow in developing items that
address each assessment target or evidence statement. The ideal specification provides sufficient
guidance so that two item writers working independently from the same specification would
generate essentially comparable items or tasks for a given assessment target or evidence
statement, such that students would be expected to perform similarly on both.





Item writers then use the specifications to generate items and tasks, which in turn are
subjected to content and bias reviews as well as pilot testing. Items and tasks that survive this
process as substantively and psychometrically sound are then assigned to test forms according to
blueprints. These blueprints provide the rules for assembling items that will be administered to
students so that the operational test forms will adequately represent the claims and range of
evidence required to draw valid inferences about student proficiency relative to the claims. Test
forms are then field tested and additional reliability and validity studies conducted.

Test Blueprints: Plans for Assembling Items Into Test Forms

Although the PARCC and Smarter Balanced operational tests were implemented for the
first time in spring 2015, the blueprints they used to create the test forms were made available to
the public in previous years (see
http://www.parcconline.org/assessments/test-design/ela-literacy/test-specifications-documents,
http://parcconline.org/assessments/test-design/mathematics/math-test-specifications-documents,
http://www.smarterbalanced.org/smarter-balanced-assessments/).² The blueprints lay out the
standards content represented in each test form, by grade level, and give some indication of the
depth of knowledge at which the assessment targets will be addressed.

Blueprint formats and specifications vary for the two consortia, in part because of
differences in their on-demand test designs. Although both systems include both end-of-year
(EOY) on-demand and performance assessment components, the on-demand tests from Smarter
Balanced utilize computer-adaptive testing (CAT), which essentially individualizes the items that
are administered to students based on their prior responses. PARCC, in contrast, will use fixed
form assessments, which are common across students.

The Smarter Balanced blueprints are organized by claim area and specify the number of
CAT and/or performance assessment items that will be included in each content category (e.g.,
literary versus informational text within reading) and indicate the number of items that will
address each assessment target or group of assessment targets. The blueprints indicate the depth
of knowledge at which each assessment target can be assessed (more than one level can be
specified) and establish a minimum number of items at DOK3 and/or DOK4 for each test,
among other details.
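The blueprint rules described above can be sketched as a small data structure and a check. This is a minimal illustration only: the class names, the `form_meets_blueprint` helper, and all item counts are hypothetical and are not taken from either consortium's specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    claim: str   # claim the item addresses, e.g. "Reading"
    dok: int     # Depth-of-Knowledge level, 1-4
    points: int  # maximum raw score points


@dataclass(frozen=True)
class ClaimRule:
    n_items: int        # number of items the blueprint requires for the claim
    min_dok3_plus: int  # minimum number of those items at DOK3 or DOK4


def form_meets_blueprint(form, blueprint):
    """Return True if the form satisfies every claim rule in the blueprint."""
    for claim, rule in blueprint.items():
        items = [i for i in form if i.claim == claim]
        high = sum(1 for i in items if i.dok >= 3)
        if len(items) != rule.n_items or high < rule.min_dok3_plus:
            return False
    return True


# Hypothetical blueprint and test form, for illustration only.
blueprint = {
    "Reading": ClaimRule(n_items=3, min_dok3_plus=1),
    "Writing": ClaimRule(n_items=2, min_dok3_plus=1),
}
form = [
    Item("Reading", 1, 1), Item("Reading", 2, 1), Item("Reading", 3, 2),
    Item("Writing", 2, 1), Item("Writing", 4, 10),
]
print(form_meets_blueprint(form, blueprint))
```

A form that drops its only DOK3 reading item to DOK2 would fail the check, which is the sense in which a blueprint minimum constrains every student's test.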

PARCC provides similar levels of detail in its English language arts blueprints, termed
Common Forms Specifications, but organizes them by task type (e.g., literature analysis, [...]
research). Cognitive complexity [...], however, are not yet provided to the public. In
mathematics, PARCC uses its three task types to organize its High Level Blueprints by grade,
which in turn are linked to the claims that PARCC's assessment is designed to evaluate. Type I
items or tasks are intended to assess basic mathematical concepts, skills, and procedures; Type II
are intended to assess mathematical reasoning and ask for written arguments and justifications;
and Type III math tasks are intended to assess modeling and real world problem solving
applications. Although the blueprints do not specify DOK levels or other cognitive complexity
distribution, it is possible to infer them from the description of task types and sample items
PARCC provides.³ We will discuss this further in our analysis.

²Original analyses for this report were completed using earlier draft versions of the PARCC
(201[...]) and Smarter Balanced (20[...]) blueprints. For purposes of this report, analyses were
redone using the recently released 2015 versions of the PARCC and Smarter Balanced ELA and
mathematics blueprints.

Depth-of-Knowledge Expectations Evidenced in Blueprints

We used the information in the blueprints to estimate the extent to which deeper learning
will be represented in the Smarter Balanced and PARCC summative assessments, using two
different but related metrics. The first is the proportion of items or tasks that are likely to be at
DOK3 or DOK4. The second is the proportion of the total raw score value that will be accounted
for by items and/or tasks at DOK3 or DOK4. The latter value takes into account the higher score
values that are often associated with tasks at higher levels of complexity, for example those that
call for an explanation and/or more extended performance tasks. Data are reported by claim,
where possible, as this is the level at which individual scores will be reported based on current
plans.
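To make the difference between the two metrics concrete, the following sketch computes both for a small made-up test form. The function name `deeper_learning_shares` and every item count and point value here are assumptions for illustration, not figures from the blueprints.

```python
def deeper_learning_shares(items):
    """Compute both metrics for a list of (dok_level, max_points) pairs.

    Returns (item_share, score_share): the proportion of items at DOK3 or
    DOK4, and the proportion of total raw score points those items carry.
    """
    high = [(dok, pts) for dok, pts in items if dok >= 3]
    item_share = len(high) / len(items)
    score_share = sum(pts for _, pts in high) / sum(pts for _, pts in items)
    return item_share, score_share


# Hypothetical form: eight 1-point items at DOK1/DOK2,
# one 2-point DOK3 item, and one 10-point DOK4 task.
items = [(1, 1)] * 4 + [(2, 1)] * 4 + [(3, 2), (4, 10)]
item_share, score_share = deeper_learning_shares(items)
print(f"{item_share:.0%} of items, {score_share:.0%} of raw score")
```

In this toy form only two of ten items sit at DOK3 or above, yet they carry 12 of the 20 available points, which is the kind of gap between the two metrics that the analysis below reports.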

Smarter Balanced ELA Summative Assessments

Table 1 shows the distribution of DOK3 and/or DOK4 items for Smarter Balanced ELA
CAT and performance task assessments for reading, writing, speaking and listening, and research
claims. Because the distributions are identical across all grades for the performance task
components and nearly so for the CAT component, this table shows the mean distribution across
elementary, middle, and high school (see Appendix for results for each grade span). All items
and tasks include given stimuli (e.g., literary texts, informational texts) and may involve the
analysis of multiple texts. Although we focus in this report on the DOK findings, the data
provide important content information. Forty percent of the CAT items will address the reading
claim and its constituent assessment targets, while writing and research will be the focus of the
performance tasks. As would be expected, the data also indicate that a relatively small proportion
of the CAT items are at or above DOK3, because the CAT format is not conducive to the kinds
of extended response tasks that typically address higher DOK levels. In contrast, the
performance task component is centered on these higher level applications.

³See [...] for [...] of the PARCC Cognitive Complexity Frameworks for ELA/Literacy and
Mathematics.

Table 1

Smarter Balanced ELA, Across Grades: Number of Items by Claim, Total and Minimum at DOK3+

Claim                                Mean # items   % Total items   # at DOK3+   % at DOK3+

CAT component
  Claim 1: Reading                   [...]          40%             2.3          14%
  Claim 2: Writing                   10.0           24%             1.0          10%
  Claim 3: Speaking and listening    9.0            22%             0.0          0%
  Claim 4: Research                  6.0            15%             0.0          0%
  Subtotal                           [...]          100%            3.3          8%

Performance task component
  Claim 1: Reading                   0.0            0%              n/a          n/a
  Claim 2: Writing                   3.0            50%             3.0          100%
  Claim 3: Speaking and listening    0.0            0%              n/a          n/a
  Claim 4: Research                  3.0            50%             3.0          100%
  Subtotal                           6.0            100%            6.0          100%

Note. DOK3 and DOK4 are not differentiated because the specifications did not consistently
differentiate the two.

The representation of DOK in English language arts, however, changes substantially when
one considers the proportion of total raw score points that will be based on these higher DOK
levels. The difference, of course, occurs because items that are more complex typically are worth
more points; for example, CAT items are often scored on a rubric from 0-2 points (partially to
fully correct), while essay rubrics have more extended point scales. Based on the blueprints,
coupled with our analysis of the points associated with items and tasks in sample items and
practice tests, we assume the following: (a) The CAT items at DOK3 are typically worth 2 points
each. (b) The writing task in the performance assessment will be worth 10 points (i.e., 4 points
for organization/purpose, 4 points for evidence/elaboration, and 2 points for conventions). (c)
The performance assessment items addressing research will be worth [...] points. Based on these
assumptions, we estimate, for every student who takes the assessment, the following minimal
proportions of their total possible raw score will be based on DOK3 and/or DOK4:

• Elementary school (Grades 3-5): Minimally [...]
• Middle school (Grades 6-8): Minimally [...]
• High school (Grades 9-12): Minimally [...]9%


As noted, these are based on the minimums that Smarter Balanced has established in its
blueprint specifications. However, for some students, the percentages may be much higher, if
based on their prior item performance, the CAT algorithm assigns high-level items beyond the
stated minimum.

Based on the blueprints and the possible DOK levels at which each target could be
assessed, we also estimated the maximum percentage of total score points that could be
attributed to DOK3 and DOK4 items for a student. Here we assumed that any target that might
be assessed at Level 3 or 4 would be; for example, a target that was specified at DOK2 and
DOK3 was assumed to be assessed at DOK3, and a target specified at DOK3 or DOK4 was
assumed to be assessed at DOK4. Based on these assumptions, more than two thirds of the total
possible score could be based on DOK3 and/or DOK4. It is unlikely that any student will be
assessed at the maximum, but it is likely that students will be administered more than the
minimum and that somewhere in the middle (approximately 50% of the total score) might be a
reasonable estimate.
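The maximum-case assumption above (every target is assessed at the highest DOK level it allows) can be sketched as follows. The point values in `POINTS`, the helper name `max_high_dok_share`, and the list of targets are hypothetical; only the 2-point value for DOK3 items echoes an assumption stated in the text.

```python
# Hypothetical maximum points per item at each DOK level.
POINTS = {1: 1, 2: 1, 3: 2, 4: 2}


def max_high_dok_share(targets):
    """Upper-bound share of raw score points at DOK3 or DOK4.

    targets: one set of allowable DOK levels per item slot.
    Each slot is promoted to the highest level it allows.
    """
    assigned = [max(levels) for levels in targets]
    total = sum(POINTS[dok] for dok in assigned)
    high = sum(POINTS[dok] for dok in assigned if dok >= 3)
    return high / total


# Made-up targets: three that top out at DOK2, one DOK2/DOK3, one DOK3/DOK4.
targets = [{1, 2}, {1, 2}, {2}, {2, 3}, {3, 4}]
print(f"{max_high_dok_share(targets):.1%}")
```

Because the adaptive algorithm need not choose the highest allowable level for every target, a figure computed this way is a ceiling rather than a forecast.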

Smarter Balanced Math Summative Assessments

Table 2 shows the distribution of DOK3 and DOK4 items for the Smarter Balanced math
CAT and performance task components for the concepts and procedures, problem solving and
modeling, and communicating reasoning claims.⁴ As with ELA, because the distributions for the
performance task components are identical across grades and those for the CAT component are
very similar, the table summarizes mean distributions across elementary, middle, and high school
(see Appendix for results for each grade span).

Here, as in ELA, we see less attention to DOK3 and DOK4 in the CAT component. More
specifically, DOK3 and DOK4 are not represented for Claim 1, which constitutes the majority of
the CAT items, but are represented in sizable proportions for the relatively few items addressing
Claims 2/4 and 3. These latter claims are the focus of the performance assessment component,
where half the items are at least at DOK3.

Again, the picture changes when we examine the proportion of total possible score points
associated with items at the highest DOK levels, because DOK3 and DOK4 items and tasks are
worth more points than those at lower DOK levels. Based on the blueprints, and coupled with
our analysis of the points associated with items and tasks in sample items and practice tests, we
assume that CAT items at DOK3 will be worth 2 or 3 points each, with a mean of 2.5. We also
assume that higher level items on the performance task component will be evenly split between
DOK3 and DOK4, with an average value of 3 points each. With these assumptions in mind, we
estimate that, for all students, the following minimal proportions of their Smarter Balanced math
scores will be based on DOK3 and/or DOK4 items:

⁴Smarter Balanced has combined Claims 2 and 4 for the purpose of score reporting.

• Elementary school (Grades 3-5): Minimally [...]
• Middle school (Grades 6-8): Minimally 39%
• High school (Grades 9-12): Minimally 3[...]%


Table 2
These proportions are very similar to those for ELA and again represent the minimum of
what every student will be administered. The maximum that a student could see is considerably
higher, approximately 70%, if all assessment targets designated as possibly DOK3 or DOK4
were assessed at these highest levels. However, as with the ELA, the likely maximum is lower.

PARCC ELA Summative Assessments

Table 3 and Table 4 reorganize the data in the PARCC blueprints to display the number
and percentage of items, score points, and total possible raw score points representing each of
PARCC's major claims for its ELA summative assessment. Since the PARCC blueprints do not
include explicit DOK or cognitive complexity specifications, we use score points accorded to
various item types as a rough indicator.

As shown in Table 3, the PARCC EOY test concentrates on their reading claim and, across
grades, each item is worth 2 points. In Grades [...] the assessment is to be composed of 10 items
for a total of 20 points, and for Grades [...]12 there will be 16 items for a total of 32 points. With
all items, students are required to read given texts.
Table 4 shows the distribution of PARCC's performance assessment component by grade
and claim. At each grade level, the assessment is composed of three major tasks: one involving
literary analysis of two given texts, one involving a research simulation based on three given
texts, and a narrative task based on one short text. Within each task type, students respond to
reading questions about each text for 2 points, and then respond to a prose constructed response
(PCR) task. The resulting essay is scored for written expression, knowledge of language and
conventions and, for literary analysis and research tasks, for reading/use of evidence. Depending
on the grade level, total possible scores on PCR tasks for reading range from 3 to 4 points
(Grades 3-5 and Grades [...]12, respectively); for written expression from 9 to 12 points (Grades
3-5 and Grades [...]12, respectively); and knowledge of language and conventions have a possible
score of 3 points. As with the Smarter Balanced performance tasks, the PARCC performance
component addresses research and writing, but also incorporates significant attention to its
reading claim.
[...] values associated with these item types by claim and grade level. These data indicate that
[...] based on items and tasks that tap high-level cognitive demands. This estimate also takes into
account that approximately one third of a student's total score on the tests should represent each
level on the PARCC cognitive complexity framework for ELA/Literacy (i.e., low, medium, high;
PARCC, personal communication, October 3, 2014).
PARCC Mathematics Summative Assessments

Table 6 and Table 7 reorganize the data in the PARCC math blueprints by claim for the
EOY and performance assessment components. The tables display by claim and grade level the
number of items planned at each score value, the total number of items and possible raw score
points, and the percentage of items expected at high levels of cognitive complexity. Again, in the
absence of direct specification of cognitive complexity, we use score value as a rough indicator.
Similar to PARCC's ELA assessment, we consider score values of three and above as addressing
higher DOK levels and deeper learning.
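The score-value proxy described above can be sketched as a simple threshold rule. The helper name `high_level_by_score_value` and the list of score values are hypothetical; the threshold of three points follows the rule stated in the text, and treating score value as a stand-in for DOK is itself a rough assumption.

```python
def high_level_by_score_value(score_values, threshold=3):
    """Classify items as higher level when their score value meets the threshold.

    score_values: maximum points for each item on a form.
    Returns (share of items, share of raw score points) at or above threshold.
    """
    high = [v for v in score_values if v >= threshold]
    return len(high) / len(score_values), sum(high) / sum(score_values)


# Made-up form: six 1-point items, two 2-point items, one 3-point and one 4-point task.
values = [1] * 6 + [2] * 2 + [3, 4]
item_share, point_share = high_level_by_score_value(values)
print(f"{item_share:.0%} of items, {point_share:.0%} of points")
```

A proxy of this kind can misclassify a multi-part item that earns several points for routine work, so it is best read as an approximation until explicit cognitive complexity specifications are available.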
As Table 6 shows, the EOY test concentrates on PARCC's three mathematics claims:

• Claim A: Major content with connections to practices
• Claim B: Additional and supporting content with connections to practices
• Claim E: Fluency

As noted earlier, PARCC organized their math blueprint by task type and do not
differentiate the number of items addressing each claim. Instead, the blueprint specifies that
Claims A, B, and E (the more basic knowledge-oriented standards) will all be addressed by
Type I items. They may also involve mathematical practices.

Given the emphasis on claim and task type, it is not surprising to see that the EOY exam
generally concentrates on basic knowledge and application levels. However, the exam does
appear to call on progressively more and some deeper applications as the student's grade level
advances.

Table 7 reveals that PARCC's performance assessment component addresses the following
claims, as well as the claims addressed in the EOY assessment:

• Claim C: Highlighted practices MP.3 and 6 with connections to content (expressing
mathematical reasoning)

• Claim D: Highlighted practice MP.4 with connections to content (modeling/
application)

The performance assessments are composed of previously defined Type I, II, and III items,
with the latter two devoted to Claims C and D. As a reminder, Type II and III items likely align
with DOK3, as at least half their point values are awarded based on the quality of student
reasoning and/or modeling.

Table 7 shows that based on the PARCC blueprint, all items addressing Claims C and D
will reflect deeper levels of learning, while those assessing Claims A, B, and E, as expected,
remain focused on basic knowledge and application. In terms of the proportion of items, those
addressing basic knowledge and applications constitute more than half the performance
assessment component, while those communicating reasoning (Claim C) and modeling (Claim
D) each draw less than a quarter of the items. Allocations of higher level items are roughly
similar for elementary and middle school grades, but increase slightly at the high school level
because of the increased demands specified for Algebra 2 and Math III courses.
A truer picture of how much deeper learning counts is found in the weight
given to higher level items in computing students' total score. Based on our assumptions about
the relationship between score points accorded to an item and/or task type, Table 8 shows the
distribution of item score values by grade level and claim and the proportion of score values that
can be attributed to higher level items. Here we see a progression of increasing weight being
given to higher level items, from approximately one third of the total score value in elementary
school to approaching half at the high school level. Again, this increase at the high school level
is due to the additional higher level items specified for Algebra 2 and Math III. The weight
through Algebra 1, Geometry, Math I, and Math II is similar to the middle school allocations.
Concluding Thoughts

This report shares an analysis of the PARCC and Smarter Balanced blueprints to project
the extent to which deeper learning will be reflected in the consortia's summative assessment
systems. We grounded our analysis in the evidence-centered design (ECD) process utilized by
both consortia, and used the distribution of items and tasks at high levels of cognitive demand
(DOK3 and DOK4) as indicators of deeper learning. We believe that several implications and
calls for action are evident in our findings.

Study Implications 

Methodology. A very different picture of representation of deeper learning emerges when
one considers the percentage of the total raw score that is attributable to higher level items rather
than metrics based on the number or proportion of items. In examining the latter, only [...]
of the consortia's assessment items and tasks appear to require higher levels of thinking, but
based on the analysis of the raw score values associated with these items, approximately 30-45%
of the total possible raw score is allocated to deeper learning. Historically, counts and
proportions of items have been used in considering the alignment between standards and
assessment (see, for example, Webb et al., 2005; Yuan & Le, 2012). With new tests, which
include technology-enhanced and other new formats that vary item score values, the field needs
to move to new metrics for conducting alignment studies. At this point in the consortia's
development process, we believe that raw score value provides the better indicator. Down the
line, based on operational tests, however, it will also be important to examine the weight given
deeper learning when raw scores are scaled and converted to the scale scores that are used for
reporting and comparison.

Performance assessments. Study analyses also make clear the strong relationship between
the performance assessment components and opportunities to assess deeper learning. For both
consortia's assessments and across both English language arts and mathematics, our analysis
indicates that the bulk of the EOY exams are focused on relatively lower level items, while the
performance task components concentrate on tasks that draw on deeper learning and higher
levels of thinking. While this relationship is not a surprise, it is worth underscoring as next
generation tests, in addition to PARCC and Smarter Balanced, are produced. Without a
performance assessment or extended response component, any test will have difficulty
incorporating deeper learning goals, or at least the depth of knowledge of tests without a
performance assessment component will require serious scrutiny.

Increased intellectual demand. Perhaps the most telling implication of the study involves
the dramatic increase in intellectual demand that the PARCC and Smarter Balanced tests will
bring and what this increase portends for student performance. If these consortia hold to their
plans, there is no doubt that their tests will be much more demanding than those most students
have previously faced. Recall that prior studies have shown that most current state tests focus
nearly exclusively on lower levels, and only those states that include extended constructed
responses (only [...] of 50 in one study) consistently reached higher levels (Webb, 2002a, 2002b;
Yuan & Le, 2012). The low levels of state tests, in fact, were a motivating factor in the federal
government's substantial investment in PARCC and Smarter Balanced. Faced with increased
intellectual challenge, many students likely will have difficulty performing well and in fact,
Kentucky and New York, the two states that already have transitioned to CCR-aligned tests, have
seen test scores plummet. A drop in scores is to be expected, and the public needs to be prepared,
as many others have noted.

Students are unlikely to perform well because they previously have not had the opportunity
to learn and attain the new, more demanding college and career ready standards that the new
tests address. Importantly, schools and teachers have not had a full opportunity to learn how to
teach the new standards nor the resources to do so, according to surveys of states, districts, and
teachers (see Rentner, 2013; Rentner & Kober, 2014; Scholastic & the Bill and Melinda Gates
Foundation, 2014). These studies show that standards implementation is well underway in most
places, and teachers are positive, but teachers also indicate that they and their students need
support to attain success.

Opportunities for Action 

We believe our findings also have implications for action moving forward. Both new
college and career ready standards and new tests of them require deeper learning. Adapting to
these new standards is not simply a matter of alignment to the surface content details of the new
standards, but must enable students to apply, communicate, and extend their knowledge and skill
to solve complex problems and meet new situations. This is the essence of the higher levels of
DOK. At these levels, students must be able to go beyond the basic concepts and procedures they
have learned to use and integrate this knowledge to think critically, reason with evidence, and
explain their thinking. Teachers need to be able to incorporate these higher level demands into
their teaching and to engage students in instructional activities and assessments that ask students
to extend their learning. One step in making this transition involves educating teachers about the
types of item prompts that focus on higher levels of DOK. For example, teachers who want to
use DOK3 and DOK4 items should consider using prompts such as those shown in Figure 3.
(See Hess, 2013, for additional guidance and examples.)

Language arts examples. The following items from PARCC's Grade 8 Practice Test
(2014) are used to illustrate the changes in demands as an assessment item moves from DOK2 to
DOK3 and DOK4. In each of these items, the student is asked to read and analyze a 34-paragraph
passage from the novel Confetti Girl, which focuses on the differences in perspective
of a daughter and father. In Figure 4, the item asks the student to answer two questions
concerning the meaning of the word sarcasm in the passage. This requires the student to use
contextual cues within the specified passage to first define and then provide an example of
sarcasm, both of which require some level of mental processing beyond recall or reproduction. In
contrast, Figure 5 includes an adaptation of the same item that has less scaffolding, involves
some higher level processing, and requires the student to provide a short explanation. More
specifically, students are now provided with a quote from the passage involving sarcasm and
have to discern what it indicates about the girl's relationship with her father. Finally, Figure 6
presents an extended version of the same basic activity where the student must write a
multi-paragraph essay where they have to synthesize the tension in Confetti Girl with the tension
presented in a passage from a second novel.

[Figures 4-6 are not reproduced here. They show DOK2, DOK3, and DOK4 versions of Item 7 from
the Grade 8 ELA Practice Test, by PARCC (2014).]

Mathematics examples. The following items from the Smarter Balanced Grade 8 Practice
Test (2014) and Sorting Equations and Identities from the Mathematics Assessment Resource
Service (Mathematics Assessment Resource Service [MARS], Shell Center, University of
Nottingham, 2012) are used to illustrate the changes in demands as an assessment item moves
from DOK2 to DOK4.


Write down an example of an equation that has:
a. One solution.

b. An infinite number of solutions.
c. No solutions.


Figure 7. DOK2 mathematics item on linear equations. Item adapted from Sorting Equations and
Identities (FAL), by MARS, Shell Center, University of Nottingham (2012).

Figure 8. DOK3 mathematics item on linear equations. Item 2042 from Grade 8 Mathematics
Practice Test, by Smarter Balanced Assessment Consortium (2014).
Consider this equation.

c = ax - bx

Joseph claims that if a, b, and c are non-negative integers, then the equation
has exactly one solution for x.

Kim disagreed with Joseph, claiming that if a, b, and c are non-negative
integers, then the equation has no solutions for x.

Do you think either Joseph or Kim are correct?

If you think Joseph is wrong, explain the mistake and how many solutions you
think x has in this equation.


If you think Kim is wrong, explain the mistake and how many solutions you
think x has in this equation.


Figure 9. DOK4 mathematics item on linear equations. Item adapted from Grade 8 Mathematics
Practice Test, by Smarter Balanced Assessment Consortium (2014).


In each of these items, students are asked to work with examples of linear equations with
one solution, infinitely many solutions, or no solutions (CCSS.Math.Content.8.EE.C.7a).
In the DOK2 version of this task, shown in Figure 7, students have to apply their
knowledge of linear equations to write examples of each type of equation. In contrast, the task
shown in Figure 8 explores this standard at the next DOK level by providing a false claim, and
then asking the student to identify which of the five given cases disprove the claim. Not only
does this problem have multiple answers (Options 2 and 3), but it requires the student to think
abstractly as they evaluate the claim. Finally, Figure 9 extends this problem by asking the student
to evaluate a second claim (x has one solution, x has no solutions), and then provide written
arguments as to why the two claims are both incorrect.
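
The three cases that these items turn on (one solution, infinitely many solutions, no solutions) can be made concrete with a short sketch. It is offered only as an illustration: the general form p*x + q = r*x + s and the function name `count_solutions` are assumptions introduced here, not part of either consortium's materials.

```python
from fractions import Fraction

def count_solutions(p, q, r, s):
    """Classify the linear equation p*x + q = r*x + s.

    Returns 'one', 'infinite', or 'none'.
    """
    coeff = Fraction(p) - Fraction(r)  # collect the x terms on the left
    const = Fraction(s) - Fraction(q)  # collect the constants on the right
    if coeff != 0:
        return "one"  # unique solution x = const / coeff
    # With no x term left, the equation reduces to 0 = const.
    return "infinite" if const == 0 else "none"

print(count_solutions(2, 1, 0, 7))  # 2x + 1 = 7
print(count_solutions(1, 1, 1, 1))  # x + 1 = x + 1
print(count_solutions(1, 1, 1, 2))  # x + 1 = x + 2
```

A claim that such an equation always has exactly one solution, or never has any, fails as soon as the x coefficients on the two sides are equal. This is the kind of counterexample reasoning the higher DOK items ask students to articulate.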

ECD tools could help. Teachers should consider how they can routinely incorporate such
deeper learning questions within their ongoing curriculum and instruction. The
evidence-centered design process, in fact, may offer some tools to help teachers do so. Consider the
products of the test development processes both consortia have used. They have established
claims about student performance that their ELA and mathematics tests are intended to evaluate.
These represent major competencies that students are expected to develop, and the big ideas of
curriculum goals: for example, "Students can read closely and analytically to comprehend a
range of increasingly complex literary and informational texts; students solve real world
problems, engaging particularly in the modeling practice," (see Figure 1 and Figure 2). Just as
classroom curriculum lays out a progression of learning targets that students will need to
accomplish to reach these broader goals, the consortia have defined the specific assessment
targets that constitute their claims and have created items and/or task specifications to measure
each one. Furthermore, in the case of Smarter Balanced, they also specified the DOK levels of
the specified targets. Intended for item writers, these publicly available models and templates
could also be used by teachers to create classroom assessments, particularly to integrate higher
levels of complexity into their ongoing instruction and assessment.

Granted, these specifications are currently very complex and not particularly user friendly
from a teacher perspective. Nonetheless, if districts, schools, and/or teachers take the time to
digest them, the specifications for DOK3 and DOK4 targets can provide some guidance.

Further, the performance task specifications for both PARCC and Smarter Balanced
provide general templates for the design of tasks that address deeper learning levels, particularly
for ELA. For example, the task specifications for Smarter Balanced indicate that students will be
exposed to at least two stimuli (in each task), consisting of one or more passages from a novel,
informational articles, videos, etc. PARCC ELA performance tasks emphasize analysis and
synthesis of two texts.

We do not mean to imply that classroom teaching and learning should be reduced to test
preparation, but rather that teachers might consider using selected consortia prompts and models
in their ongoing classroom assessment and integrate consortia-type assessment in support of their
teaching and learning goals. For example, in probing students' reading and use of evidence, they
should consider including, as part of the classroom repertoire, the types of questions and prompts
that are similar to those that will be used in the summative system. Certainly, classroom
performance assessment can and should go beyond the bounds of what can be accomplished in a
one- or two-day performance task. The general template of the ELA tasks involves having
students read multiple sources closely, analyze, and then synthesize and/or compare them in a
culminating performance. Similarly, the math tasks have students examine different data
representations (e.g., tables, equations, graphs, etc.), analyze an existing model, and then extend
on or create a new model or investigation. Both of these provide recipes that can be applied
within a unit, as a culminating performance, or an extended research paper. Students could even
potentially be given the opportunity to select their own topics and sources. A subsequent report
will further explore these ideas.

In conclusion, the study reported here indicates that both the PARCC and Smarter
Balanced summative assessment systems, based on their current blueprints, mark a significant
step forward in their demands for deeper learning. Students and teachers alike are likely to find
the assessments very challenging. However, study results, and the ECD product on which they
are based, provide a general roadmap for orienting classroom curriculum, teaching, learning, and
assessment toward success.
Table A1

Smarter Balanced ELA Summative Blueprint (Grade Span, Number, and Percentage by Claim, Total and at DOK3/4)

                              CAT component                                             Performance task component
Claim by grade                Mean N items  % Total      Min # DOK3/4  Min % DOK3/4     Mean N items  % Total  Min # DOK3/4  Min % DOK3/4

Elementary Grades 3-5
Claim 1 Reading               [illegible]   [illegible]  2             15%              0             0%       n/a           n/a
Claim 2 Writing               10            24%          1             [illegible]      3             50%      [illegible]   [illegible]
Claim 3 Speaking/Listening    [illegible]   [illegible]  0             0%               0             0%       [illegible]   n/a
Claim 4 Research              6             15%          0             0%               3             50%      2             [illegible]
Subtotal                      41            100%         5             7%               [illegible]   100%     5             [illegible]
Claim 1 Reading
n/a
10
3

Table A2

Claim by grade